sand's underlying graph implementation is igraph. igraph offers several ways to load data, but sand provides a few convenience functions that simplify the workflow:
In [1]:
import sand
csv_to_dicts
csv_to_dicts reads a CSV into a list of Python dictionaries. Each column in the CSV becomes a corresponding key in each dictionary.
Let's load a CSV of function dependencies in a Clojure library, taken from lein-topology, into a list of dictionaries:
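As a rough illustration of the column-to-key mapping, here is a minimal standard-library sketch. The name csv_to_dicts_sketch and its header parameter are assumptions made for illustration; this is not sand's actual implementation. In this sketch every value stays a string, because the csv module does no type conversion.

```python
import csv
import io


def csv_to_dicts_sketch(lines, header=None):
    """Hypothetical stand-in for sand.csv_to_dicts (not the library's code).

    `lines` is any iterable of CSV text lines, such as an open file.
    If `header` is None, the first row supplies the keys; otherwise
    `header` names the columns and every row is treated as data.
    """
    reader = csv.DictReader(lines, fieldnames=header)
    return [dict(row) for row in reader]


# Made-up sample rows with no header line, so the keys are passed explicitly.
sample = io.StringIO("a/f,b/g,1\na/f,c/h,2\n")
rows = csv_to_dicts_sketch(sample, header=['source', 'target', 'weight'])
print(rows[0])  # {'source': 'a/f', 'target': 'b/g', 'weight': '1'}
```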
In [2]:
edgelist_file = './data/lein-topology-57af741.csv'
edgelist_data = sand.csv_to_dicts(edgelist_file, header=['source', 'target', 'weight'])
edgelist_data[:5]
Out[2]:
In [3]:
functions = sand.from_edges(edgelist_data)
functions.summary()
Out[3]:
from_vertices_and_edges with two lists of dictionaries
A richer network model includes attributes on the vertex and edge collections, including unique identifiers for each vertex.
We can use Jupyter's cell magic to generate some sample data. Here we'll represent a network of students reviewing one another's work. Students (vertices) will be in people.csv and reviews (edges) will be in reviews.csv:
In [4]:
people_file = './data/people.csv'
In [5]:
%%writefile $people_file
uuid,name,cohort
6aacd73c-0be5-412d-95a3-ca54149c9952,Mark Taylor,Day 1 - Period 6
5205741f-3ea9-4c30-9c50-4bab229a51ce,Aidin Aslani,Day 1 - Period 6
14a36491-5a3d-42c9-b012-6a53654d9bac,Charlie Brown,Day 1 - Period 2
9dc7633a-e493-4ec0-a252-8616f2148705,Armin Norton,Day 1 - Period 2
In [6]:
review_file = './data/reviews.csv'
In [7]:
%%writefile $review_file
reviewer_uuid,student_uuid,feedback,date,weight
6aacd73c-0be5-412d-95a3-ca54149c9952,14a36491-5a3d-42c9-b012-6a53654d9bac,Awesome work!,2015-02-12,1
5205741f-3ea9-4c30-9c50-4bab229a51ce,9dc7633a-e493-4ec0-a252-8616f2148705,WOW!,2014-02-12,1
We again load this data into lists of dictionaries with csv_to_dicts:
In [8]:
people_data = sand.csv_to_dicts(people_file)
people_data
Out[8]:
In [9]:
review_data = sand.csv_to_dicts(review_file)
review_data
Out[9]:
In [10]:
reviews = sand.from_vertices_and_edges(
    vertices=people_data,
    edges=review_data,
    vertex_name_key='name',
    vertex_id_key='uuid',
    edge_foreign_keys=('reviewer_uuid', 'student_uuid'))
reviews.summary()
Out[10]:
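The call above links each edge to its vertices through the identifier columns. The following plain-Python sketch shows one way such a lookup could work; resolve_edges is a hypothetical helper, the short ids stand in for the UUIDs, and treating the first foreign key as the edge source is an assumption rather than documented sand behavior.

```python
def resolve_edges(vertices, edges, vertex_id_key, edge_foreign_keys):
    """Translate each edge's two foreign keys into vertex list positions."""
    index_by_id = {v[vertex_id_key]: i for i, v in enumerate(vertices)}
    source_key, target_key = edge_foreign_keys
    return [(index_by_id[e[source_key]], index_by_id[e[target_key]])
            for e in edges]


# Placeholder ids ('p1'..'p4') replace the real UUIDs for brevity.
people = [
    {'uuid': 'p1', 'name': 'Mark Taylor'},
    {'uuid': 'p2', 'name': 'Aidin Aslani'},
    {'uuid': 'p3', 'name': 'Charlie Brown'},
    {'uuid': 'p4', 'name': 'Armin Norton'},
]
review_rows = [
    {'reviewer_uuid': 'p1', 'student_uuid': 'p3'},
    {'reviewer_uuid': 'p2', 'student_uuid': 'p4'},
]

pairs = resolve_edges(people, review_rows, 'uuid',
                      ('reviewer_uuid', 'student_uuid'))

# Count edges leaving (outdegree) and entering (indegree) each vertex.
outdegree = [sum(1 for s, _ in pairs if s == i) for i in range(len(people))]
indegree = [sum(1 for _, t in pairs if t == i) for i in range(len(people))]
print(pairs)      # [(0, 2), (1, 3)]
print(indegree)   # [0, 0, 1, 1]
print(outdegree)  # [1, 1, 0, 0]
```

Under these assumptions a reviewer has an outgoing edge and the reviewed student an incoming one, which is the kind of information the indegree and outdegree vertex attributes carry.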
In [11]:
reviews.vs['indegree']
Out[11]:
In [12]:
reviews.vs['outdegree']
Out[12]:
In [13]:
reviews.vs['label']
Out[13]:
In [14]:
reviews.vs['name']
Out[14]:
Groups represent modules or communities in the network. Groups are based on the labels by default.
In [15]:
reviews.vs['group']
Out[15]:
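One plausible reading of label-based grouping is that every distinct label gets its own group identifier. The sketch below illustrates that idea with a hypothetical helper, labels_to_group_ids; the integer ids and their first-seen ordering are assumptions, not sand's documented behavior.

```python
def labels_to_group_ids(labels):
    """Give each distinct label a small integer id, in first-seen order."""
    ids = {}
    return [ids.setdefault(label, len(ids)) for label in labels]


names = ['Mark Taylor', 'Aidin Aslani', 'Charlie Brown', 'Armin Norton']
groups = labels_to_group_ids(names)
print(groups)            # [0, 1, 2, 3]
print(len(set(groups)))  # 4
```

When every label is unique, as here, this scheme yields exactly one group per vertex.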
The vertices in the lein-topology data set contain fully-qualified namespaces for functions. Grouping by name isn't particularly useful here:
In [16]:
len(set(functions.vs['group']))
Out[16]:
In [17]:
len(functions.vs)
Out[17]:
Because sand was built specifically for analyzing software and system networks, a fqn_to_groups grouping function is built in:
In [18]:
functions.vs['group'] = sand.fqn_to_groups(functions.vs['label'])
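To make the idea of namespace-based grouping concrete, here is a minimal sketch. It assumes a fully-qualified name has the Clojure form namespace/function and that all functions in a namespace share a group; fqn_groups_sketch and the sample labels are hypothetical, and sand's fqn_to_groups may work differently.

```python
def namespace_of(fqn):
    """Return the text before the first '/', or the whole name if none."""
    return fqn.split('/', 1)[0]


def fqn_groups_sketch(labels):
    """Assign one integer group id per namespace, in first-seen order."""
    ids = {}
    return [ids.setdefault(namespace_of(label), len(ids)) for label in labels]


# Made-up labels for illustration only.
labels = [
    'example.app/start',
    'example.app/stop',
    'example.db/connect',
    'clojure.core/map',
]
fqn_groups = fqn_groups_sketch(labels)
print(fqn_groups)            # [0, 0, 1, 2]
print(len(set(fqn_groups)))  # 3
```

Four vertices collapse into three groups here because two functions share the example.app namespace.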
In [19]:
len(set(functions.vs['group']))
Out[19]:
This is a much more manageable number of groups. We'll see one way that these groups are useful when we render a visualization of the network: